Abstract. In this paper we study a new, generalized version of the well-known
group testing problem. In the classical model of group testing we
are given n objects, some of which are considered to be defective. We
can test certain subsets of the objects to learn whether they contain at least
one defective element. The goal is usually to find all defectives using as few
tests as possible. In our model the presence of defective elements in a
test set $Q$ can be recognized if and only if their number is large enough
compared to the size of $Q$. More precisely, for a test $Q$ the answer is YES
if and only if there are at least $\alpha|Q|$ defective elements in $Q$ for some
fixed $\alpha$.
1 Introduction

The concept of group testing was developed in the middle of the previous century.
Dorfman, a Swiss physician, intended to test blood samples of millions of soldiers
during World War II in order to find those who were infected by syphilis. His
key idea was to test several blood samples at the same time and learn whether
at least one of them is infected [3]. Some fifteen years later Rényi developed a
theory of search in order to find which electrical part of his car went wrong. In
his model, unlike in Dorfman's, not all subsets of the possible
defectives (electric parts) could be tested [6].

Group testing now has a wide variety of applications in areas like DNA
screening, mobile networks, software and hardware testing.

In the classical model we have an underlying set $[n] = \{1, \ldots, n\}$ and we
suppose that there may be some defective elements in this set. We can test any
subset of $[n]$ to learn whether it contains at least one defective element. The goal is to
find all defectives using as few tests as possible. One can easily see that in this 
generality the best solution is to test every set of size 1. Usually we have some 
additional information like the exact number of defectives (or some bounds on 
this number) and it is also frequent that we do not have to find all defectives 
just some of them or even just to tell something about them. 

In the case when we have to find a single defective it is well-known that the 
information theoretic lower bound is sharp: the number of questions needed in 
the worst case is $\lceil \log n \rceil$, which can be achieved by binary search.

Another well-known version of the problem is when the maximum size of a 
test is bounded. (Motivated by the idea that too large tests are not supposed to 
be reliable, because a small number of defectives may not be recognized there). 
This version can be solved easily in the adaptive case, but is much more difficult 
in the non-adaptive case. This latter version was first posed by Rényi. Katona
[5] gave an algorithm to find the exact solution to Rényi's problem and he also
proved the best known lower bound on the number of queries needed. The best
known upper bound is due to Wegener [7].

In this paper we assume that the presence of defective elements in a test set
$Q$ can be recognized if and only if their number is large enough compared to the
size of $Q$. More precisely, for a test $Q \subseteq [n]$ the answer is YES if and only if there
are at least $\alpha|Q|$ defective elements in $Q$. Our goal is to find at least $m$ defective
elements using tests of this kind.

Definition 1. Let $g(n,k,\alpha,m)$ be the least number of questions needed in this
setting, i.e. to find $m$ defective elements in an underlying set of size $n$ which
contains at least $k$ defective elements, where the answer is YES for a question
$Q$ if and only if there are at least $\alpha|Q|$ defective elements in $Q$.

We suppose throughout the whole paper that $1 \le m \le k$ and $0 < \alpha < 1$. Let
$a = \lfloor 1/\alpha \rfloor$, that is, $a$ is the largest size of a set where the answer NO has the usual
meaning, namely that there are no defective elements in the set. It is obvious
that if a set of size greater than $k/\alpha$ is asked then the answer is automatically
NO, so we will suppose that question sets have size at most $k/\alpha$. All logarithms
appearing in the paper are binary.
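
As a small illustration of the query model, the following Python sketch evaluates a single test and computes $a$. It is only a sketch: the names `threshold_query` and `classical_size` are chosen here for illustration and do not come from the paper, and $\alpha$ is passed as an exact `Fraction` to avoid floating-point rounding at the threshold.

```python
from fractions import Fraction
from math import floor


def threshold_query(query, defectives, alpha):
    """Answer YES (True) iff the query holds at least alpha*|Q| defectives."""
    query = set(query)
    return len(query & set(defectives)) >= alpha * len(query)


def classical_size(alpha):
    """Largest query size a = floor(1/alpha) for which NO means 'no defective'."""
    return floor(1 / alpha)


alpha = Fraction(1, 10)
defectives = {0, 1}          # k = 2 defective elements (illustrative choice)
a = classical_size(alpha)    # a = 10 for alpha = 1/10

# Size 10 with one defective: 1 >= 10 * alpha, so the answer is YES.
print(threshold_query(range(1, 11), defectives, alpha))
# Size 11 with one defective: 1 < 11 * alpha, so the answer is NO.
print(threshold_query(range(1, 12), defectives, alpha))
# Size 21 > k/alpha: even both defectives cannot produce a YES.
print(threshold_query(range(0, 21), defectives, alpha))
```

The last call shows why queries larger than $k/\alpha$ carry no information when there are exactly $k$ defectives.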

It is worth mentioning that a similar idea appears in a paper by Damaschke
[1] and a follow-up paper by De Bonis, Gargano, and Vaccaro [2]. Since their
motivation is to study the concentration of liquids, their model deals with many 
specific properties arising in this special case and they are interested in the 
number of merging operations or the number of tubes needed in addition to the 
number of tests. 

If $k = m = 1$, then the problem is basically the same as the usual setting
with the additional property that the question sets can have size at most $a$: this
is the above mentioned problem of Rényi. As we have mentioned, finding the
optimal non-adaptive algorithm, or even just good bounds is really hard even in 
this simplest case of our model, thus in this paper we deal only with adaptive 
algorithms. 



In the next section we give some upper and lower bounds as well as some 
conjectures depending on the choices of $n$, $k$, $\alpha$, and $m$. In the third section
we prove our main theorem, which gives a general lower and a general upper 
bound, differing only by a constant depending only on k. In the fourth section 
we consider some related questions and open problems. 

2 Upper and lower bounds 

First of all it is worth examining how binary search, the most basic algorithm 
of search theory works in our setting. It is easy to see that it does not work 
in general, not even for $m = 1$. If (say) $k = 2$ and $\alpha = 0.1$, then question
sets have at most 20 elements (recall that we supposed that there are no queries
containing more than $k/\alpha$ elements, since they give no information at all, because
the answer for them is always NO), thus if $n$ is big, we cannot perform a binary
search.

However, if $k \ge n\alpha$, then binary search can be used.

Theorem 1. If $\alpha \le k/n$, then $g(n,k,\alpha,m) \le \lceil \log n \rceil + c$, where $c$ depends only
on $\alpha$ and $m$; moreover, if $m = 1$, then $c = 0$.

Proof. We show that binary search can be used to find $m$ defectives. That is,
first we ask a set $F$ of size $\lfloor n/2 \rfloor$ and then the underlying set is substituted by $F$
if the answer is YES and by $\overline{F}$ if the answer is NO. We iterate this process until
the size of the underlying set is at most $2m/\alpha$. Now we check that the condition
$\alpha \le k/n$ remains true after each step. Let $n' = \lfloor n/2 \rfloor$ be the size of the new
underlying set and $k'$ be the number of defectives there. If the answer was YES,
then $k' \ge \alpha n'$, thus $\alpha \le k'/n'$. If the answer was NO, then there are at least
$k - \lceil \alpha n' \rceil + 1$ defectives in the new underlying set, that is $k' \ge k - \lceil \alpha n' \rceil + 1 \ge
\alpha n - \lceil \alpha n' \rceil + 1 \ge \alpha n'$, thus $\alpha \le k'/n'$ again.

Now if $m = 1$ we simply continue the binary search until we find a defective
element, altogether using at most $\lceil \log n \rceil$ questions.

If $m > 1$, then we can find $m$ defectives in the last underlying set using at
most $c := \max_{n' \le 2m/\alpha} g(n', m, \alpha, m)$ further queries.

(Notice that since the size of the last underlying set is greater than $m/\alpha$, it
contains at least $m$ defectives.) This number $c$ does not depend on $k$, just on $\alpha$
and $m$, and it is obvious that we used at most $\lceil \log n \rceil + c$ queries altogether. □

This theorem has an easy, yet very important corollary. If the answer for a
question $A$ is YES, then there are at least $\alpha|A|$ defective elements in $A$. If $\alpha|A| \ge m$,
then we can find $m$ of these defectives using $g(|A|, \alpha|A|, \alpha, m) \le \lceil \log |A| \rceil + c$
questions, where $c$ depends only on $\alpha$ and $m$. Basically it means that whenever
we obtain a YES answer, we can finish the algorithm quickly.

The proof of Theorem 1 is based on the fact that if the ratio of the defective
elements $k/n$ is at least $\alpha$, then this condition always remains true during binary
search. If $k/n < \alpha$, then this trick does not work; however, if the difference
between $k/n$ and $\alpha$ is small, a similar result can be proved for $m = 1$. Recall
that $a = \lfloor 1/\alpha \rfloor$.



Theorem 2. If $k \ge \frac{n}{a} - \lfloor \log \frac{n}{a} \rfloor - 1$ and $k \ge 1$, then $g(n, k, \alpha, 1) \le \lceil \log n \rceil + 1$.

The proof of the theorem is based on the following lemmas.

Lemma 1. Let $t \ge 0$ be an integer. Then $g(2^t a, 2^t - t, \alpha, 1) \le t + \lceil \log a \rceil$.

Proof. We use induction on $t$. For $t = 0$ and $t = 1$ the proposition is true,
since we can perform a binary search on $a$ or $2a$ elements (by asking sets of
size at most $a$ we learn whether they contain a defective element). Suppose now
that the proposition holds for $t$; we have to prove it for $t + 1$. That is, we have
an underlying set of size $2^{t+1}a$ containing at least $2^{t+1} - t - 1$ defectives. Our
first query is a set $A$ of size $2^t a$. If the answer is YES, then we can continue
with binary search. If the answer is NO, then there are fewer than $\alpha 2^t a \le 2^t$
defectives in $A$, therefore there are at least $2^{t+1} - t - 1 - 2^t + 1 = 2^t - t$
defectives in $\overline{A}$. By the induction hypothesis $g(2^t a, 2^t - t, \alpha, 1) \le t + \lceil \log a \rceil$, thus
$g(2^{t+1}a, 2^{t+1} - t - 1, \alpha, 1) \le t + 1 + \lceil \log a \rceil$ follows, finishing the proof of the
lemma. □

Lemma 2. Let $t \ge 2$ be an integer. Then $g(2^t a, 2^t - t - 1, \alpha, 1) \le t + \lceil \log a \rceil + 1$.

Proof. Let us start with asking three disjoint sets, each of cardinality $2^{t-2}a$. If
the answer to any of these is YES, then we can continue with binary search, using
$t - 2 + \lceil \log a \rceil$ additional questions. If all three answers are NO, then there are
at least $2^t - t - 1 - 3(2^{t-2} - 1) = 2^{t-2} - (t - 2)$ defectives among the remaining
$2^{t-2}a$ elements, hence we can apply Lemma 1. □

Proof (of Theorem 2). Let us suppose $n > 2a$ (otherwise binary search works)
and let $t = \lfloor \log \frac{n}{a} \rfloor$, $r = n - 2^t a$. We have an underlying set of size $n = 2^t a + r$
containing at least $\frac{n}{a} - \lfloor \log \frac{n}{a} \rfloor - 1$ defectives. If $r = 0$, then by Lemma 2 we
are done. Otherwise let the first query $A$ contain $r$ elements. A positive answer
allows us to find a defective element by binary search on $A$ using altogether at
most $\lceil \log n \rceil + 1$ questions (actually, at most $\lceil \log n \rceil$ questions, because $r < n/2$).
If the answer is negative then the new underlying set contains $2^t a$ elements, of
which more than $\frac{n}{a} - \lfloor \log \frac{n}{a} \rfloor - \alpha r - 1 = 2^t + r/a - \alpha r - \lfloor \log \frac{n}{a} \rfloor - 1 \ge 2^t - \lfloor \log \frac{n}{a} \rfloor - 1$
are defective. Since $\lfloor \log \frac{n}{a} \rfloor = t$, the number of defectives is at least $2^t - t$, thus
by Lemma 1 we need at most $t + \lceil \log a \rceil$ more queries to find a defective element,
thus altogether we used at most $t + 1 + \lceil \log a \rceil \le \lceil \log n \rceil + 1$ queries, from which
the theorem follows. □



One might think that binary search is the best algorithm to find one defective
if it can be used (i.e. for $k \ge n\alpha$). A counterexample for $k$ really big is easy to
give: if $k = n$ then we do not need any queries and for $m = 1$, $k = n - 1$ we need
just one query. It is somewhat more surprising that $g(n, \alpha n, \alpha, 1) \ge \lceil \log n \rceil$ is
not necessarily true.

For example, the case $n = 10$, $k = 4$, $\alpha = 0.4$, $m = 1$ can be solved using 3
queries: first we ask a set $A$ of size 4. If the answer is YES, we can perform a
binary search on $A$; if the answer is NO, then there are at least 3 defectives among
the remaining 6 elements and now we ask a set $B$ of size 2. If the answer is YES
then we perform a binary search on $B$, otherwise there are at least 3 defectives
among the remaining 4 elements, so one query (of size 1) is sufficient to find a
defective. However, a somewhat weaker lower bound can be proved:
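
The three-query strategy for $n = 10$, $k = 4$, $\alpha = 0.4$ can be checked exhaustively. The Python sketch below is one possible encoding of it: the element labels 0..9, the choice $A = \{0,1,2,3\}$, $B = \{4,5\}$ and the function names are assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import combinations

ALPHA = Fraction(2, 5)  # alpha = 0.4, kept exact to avoid float rounding


def find_one(defectives):
    """Run the three-query strategy; return (element found, queries used)."""
    queries = 0

    def ask(q):
        nonlocal queries
        queries += 1
        return len(defectives & set(q)) >= ALPHA * len(q)

    if ask([0, 1, 2, 3]):                 # YES: at least 2 defectives in A
        half = [0, 1] if ask([0, 1]) else [2, 3]
        found = half[0] if ask([half[0]]) else half[1]
    elif ask([4, 5]):                     # NO on A, YES on B
        found = 4 if ask([4]) else 5
    else:                                 # at least 3 defectives among 6..9
        found = 6 if ask([6]) else 7
    return found, queries


def worst_case():
    """Check every defective set of size >= 4; return the max query count."""
    worst = 0
    for size in range(4, 11):
        for d in combinations(range(10), size):
            found, used = find_one(set(d))
            assert found in d
            worst = max(worst, used)
    return worst


print(worst_case())
```

Every branch of this encoding asks exactly three questions, which is below $\lceil \log 10 \rceil = 4$.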

Theorem 3. $g(n, k, \alpha, m) \ge \lceil \log(n - k + 1) \rceil$.

We prove the stronger statement that even if one can use any kind of yes-no
questions, still at least $\lceil \log(n - k + 1) \rceil$ questions are needed. This is a slight
generalization of the information theoretic lower bound.

Theorem 4. To find one of $k$ defective elements from a set of size $n$, one needs
$\lceil \log(n - k + 1) \rceil$ yes-no questions in the worst case and this is sharp.

Proof. Suppose there is an algorithm that uses at most $q$ questions. The number
of sequences of answers obtained is at most $2^q$, thus the number of different
elements selected by the algorithm as the output is also at most $2^q$. This means
that $n - 2^q \le k - 1$, otherwise it would be possible that all $k$ defective elements
are among those that were not selected. Thus $q \ge \lceil \log(n - k + 1) \rceil$ indeed.

Sharpness follows easily from the simple algorithm that puts $k - 1$ elements
aside and runs a binary search on the rest. □
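
The sharpness argument can be sketched in Python as follows. This is a minimal illustration, assuming classical yes-no questions of the form "does this set contain a defective?"; the function name `find_defective` and the convention of setting aside the last $k - 1$ labels are choices made here.

```python
from itertools import combinations


def find_defective(n, k, defectives):
    """Set k-1 elements aside, then binary search on the other n-k+1."""
    candidates = list(range(n - k + 1))   # always holds at least one defective
    queries = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        queries += 1
        if defectives & set(half):        # classical yes-no question
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0], queries


def ceil_log2(s):
    """Exact ceil(log2(s)) for an integer s >= 1."""
    return (s - 1).bit_length()


n, k = 9, 3
worst = 0
for d in combinations(range(n), k):
    found, used = find_defective(n, k, set(d))
    assert found in d
    worst = max(worst, used)
print(worst, ceil_log2(n - k + 1))
```

At most $k - 1$ defectives can sit among the elements set aside, so the candidate list always contains a defective and the halving never loses it.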

Theorem 3 is an immediate consequence of Theorem 4, but this is not true
for the sharpness of the result. However, Theorem 3 is also sharp: if $\alpha \le \frac{2}{n-k+1}$,
then we can run a binary search on any $n - k + 1$ of the elements to find a
defective.

We have seen in Theorem 1 that if $n \le k/\alpha$, then binary search works (with
some additional constant number of questions if $m > 1$). On the other hand, if
$n$ goes to infinity (with $k$ and $\alpha$ fixed), then the best algorithm is linear.

Theorem 5. For any $k$, $\alpha$, $m$
$$\frac{n}{a} + c_1 \le g(n,k,\alpha,m) \le \frac{n}{a} + c_2,$$
where $c_1$ and $c_2$ depend only on $k$, $\alpha$, and $m$.

Proof. Upper bound: first we partition the underlying set into $\lfloor n/a \rfloor$ $a$-element
sets and possibly one additional set of fewer than $a$ elements. We ask each of these
sets (at most $\lfloor n/a \rfloor + 1$ questions). Then we choose $m$ sets for which we obtained
a YES answer (or if there are fewer than $m$ such sets, then we choose all of them).
We ask every element one by one in these sets (at most $ma$ questions). One can
easily see that we find at least $m$ defective elements, using at most $\lfloor n/a \rfloor + ma + 1$
questions.

Lower bound: We use a simple adversary's strategy: suppose all the answers
are NO and there are $m$ elements identified as defectives. Let us denote the family
of sets that were asked by $\mathcal{F}$. It is obvious that those sets of $\mathcal{F}$ that have size
at most $a$ contain no defective elements. Suppose there are $i$ such sets. We use
induction on $i$. There are $n' \ge n - ia$ elements not contained in these sets and
we should prove that at least $\frac{n}{a} + c_1 - i \le \frac{n'}{a} + c_1$ other questions are needed.
Hence by the induction it is enough to prove the case $i = 0$.

Suppose $i = 0$. If there is a set $A$ of size $k + 1$ such that $|A \cap F| \le 1$ for all
$F \in \mathcal{F}$, then any $k$-element subset of $A$ can be the set of the defective elements.
In this case any element can be non-defective, a contradiction. Thus for every
set $A$ of size $k + 1$ there exists a set $F \in \mathcal{F}$ such that $|A \cap F| \ge 2$.

Let $b = \lfloor k/\alpha \rfloor$. We know that every set of $\mathcal{F}$ has size at most $b$. Then a given
$F \in \mathcal{F}$ intersects at most $\sum_{j=2}^{k+1} \binom{b}{j}\binom{n-b}{k+1-j}$ $(k+1)$-element sets in at least two
points. This number is $O(n^{k-1})$, and there are $\Omega(n^{k+1})$ sets of size $k + 1$, hence
$|\mathcal{F}| = \Omega(n^2)$ is needed.

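It follows easily that there is an $n_0$ such that if $n \ge n_0$, then $|\mathcal{F}| \ge \frac{n}{a}$.
Now let $c_1 = -n_0/a$. If $n \ge n_0$ then $|\mathcal{F}| \ge \frac{n}{a} \ge \frac{n}{a} + c_1$, while if $n < n_0$ then
$|\mathcal{F}| \ge 0 \ge \frac{n}{a} + c_1$, thus the number of queries is at least $\frac{n}{a} + c_1$, finishing the
proof. □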
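
The upper-bound strategy from the proof of Theorem 5 can be sketched in Python as follows. The function name `linear_strategy`, the element labels and the small test instance are assumptions made for illustration; the query oracle follows Definition 1.

```python
from fractions import Fraction
from itertools import combinations
from math import floor


def linear_strategy(n, alpha, m, defectives):
    """Ask blocks of size a, then test single elements in up to m YES blocks."""
    a = floor(1 / alpha)
    queries = 0

    def ask(q):
        nonlocal queries
        queries += 1
        return len(defectives & set(q)) >= alpha * len(q)

    # Blocks have size at most a, so YES means "contains a defective".
    blocks = [list(range(i, min(i + a, n))) for i in range(0, n, a)]
    yes_blocks = [b for b in blocks if ask(b)]
    found = []
    for block in yes_blocks[:m]:
        found.extend(x for x in block if ask([x]))
    return found, queries


n, alpha, m, k = 11, Fraction(1, 3), 2, 3
a = floor(1 / alpha)
worst = 0
for size in (k, k + 1):
    for d in combinations(range(n), size):
        found, used = linear_strategy(n, alpha, m, set(d))
        assert len(found) >= m and set(found) <= set(d)
        worst = max(worst, used)
print(worst, n // a + m * a + 1)
```

If fewer than $m$ blocks answer YES, all defectives lie in those blocks, so testing their elements one by one still yields at least $k \ge m$ defectives.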

Remark. The theorem easily follows from Theorem 7; it is included here because
of the much simpler proof.



It is easy to give a better upper bound for $m = 1$.

Theorem 6. Suppose $k + \log k + 1 \le \lceil \frac{n}{a} \rceil$. Then
$$g(n,k,\alpha,1) \le \left\lceil \frac{n}{a} \right\rceil - k + \lceil \log a \rceil.$$

Proof. First we ask a set $X$ of size $ka$. If the answer is YES, then we can find
a defective element in $\lceil \log ka \rceil$ steps by Theorem 1. In this case the number of
questions used is at most $1 + \lceil \log ka \rceil = 1 + \lceil \log k + \log a \rceil \le 1 + \lceil \log k \rceil + \lceil \log a \rceil \le
\lceil \frac{n}{a} \rceil - k + \lceil \log a \rceil$, where the last inequality follows from the condition of the
theorem.

If the answer is NO, then we know that there are at most $k - 1$ defectives
in $X$, so we have at least one defective in $\overline{X}$. Continue the algorithm by asking
disjoint subsets of $\overline{X}$ of size $a$, until the answer is YES or we have at most $2a$
elements not yet asked. In these cases using at most $\lceil \log 2a \rceil$ questions we can
easily find a defective element, thus the total number of questions used is at
most $1 + \lceil \frac{n - ka - 2a}{a} \rceil + \lceil \log 2a \rceil \le 1 + \lceil \frac{n}{a} \rceil - k - 2 + \lceil \log a \rceil + 1 = \lceil \frac{n}{a} \rceil - k + \lceil \log a \rceil$,
finishing the proof. □
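
The algorithm in the proof of Theorem 6 can be sketched in Python as follows. The function name `find_one`, the choice of $X$ as the first $ka$ labels and the test instance ($n = 20$, $k = 2$, $\alpha = 1/3$) are assumptions made here; a single halving routine is used both for the dense set $X$ (as in Theorem 1) and for sets small enough that queries have their classical meaning.

```python
from fractions import Fraction
from itertools import combinations
from math import floor


def find_one(n, k, alpha, defectives):
    """Find one defective; returns (element, number of queries used)."""
    a = floor(1 / alpha)
    queries = 0

    def ask(q):
        nonlocal queries
        queries += 1
        return len(defectives & set(q)) >= alpha * len(q)

    def halve(cands):
        # Keep the half that answers YES, otherwise keep the other half.
        while len(cands) > 1:
            half = cands[: len(cands) // 2]
            cands = half if ask(half) else cands[len(half):]
        return cands[0]

    x, rest = list(range(k * a)), list(range(k * a, n))
    if ask(x):                            # X is dense: binary search in X
        found = halve(x)
        return found, queries
    while len(rest) > 2 * a:              # X has at most k-1 defectives
        block, rest = rest[:a], rest[a:]
        if ask(block):
            found = halve(block)
            return found, queries
    found = halve(rest)                   # at most 2a elements remain
    return found, queries


n, k, alpha = 20, 2, Fraction(1, 3)
a = floor(1 / alpha)
bound = -(-n // a) - k + (a - 1).bit_length()   # ceil(n/a) - k + ceil(log a)
worst = 0
for size in (k, k + 1):
    for d in combinations(range(n), size):
        found, used = find_one(n, k, alpha, set(d))
        assert found in d
        worst = max(worst, used)
print(worst, bound)
```

For this instance the condition of the theorem holds, since $k + \log k + 1 = 4 \le \lceil 20/3 \rceil = 7$.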

Note that if the condition of Theorem 6 does not hold (that is, $k + \log k + 1 >
\lceil \frac{n}{a} \rceil$), then $k \ge \frac{n}{a} - \lfloor \log \frac{n}{a} \rfloor - 1$, hence $\lceil \log n \rceil + 1$ questions are enough by Theorem 2.

The exact value of $g(n,k,\alpha,m)$ is hard to find, even for $m = 1$. The algorithm
used in the proof of Theorem 6 seems to be optimal for $m = 1$ if
$k + \log k + 1 \le \lceil \frac{n}{a} \rceil$. However, counterexamples with $1/\alpha$ not an integer are easy
to find (consider e.g. $n = 24$, $k = 2$, $\alpha = 2/11$).

Conjecture 1. If $1/\alpha$ is an integer and $k + \log k + 1 \le \lceil \frac{n}{a} \rceil$, then the algorithm
used in the proof of Theorem 6 is optimal for $m = 1$.



It is easy to see that Conjecture 1 is true for $k = 1$. For other values of $k$ it
would follow from the next, more general conjecture.

Conjecture 2. If $1/\alpha$ is an integer, then $g(n, k, \alpha, 1) \le g(n, k + 1, \alpha, 1) + 1$.

Obviously, Conjecture 2 also fails if $1/\alpha$ is not an integer. One can see for
example that $g(24, 1, 2/11, 1) = 7$ and $g(24, 2, 2/11, 1) = 5$.

3 The main theorem 

In this section we prove a lower and an upper bound differing only by a con- 
stant depending only on k. For the lower bound we need the following simple 
generalization of the information theoretic lower bound. 

Proposition 1. Suppose we are given $p$ sets $A_1, \ldots, A_p$ of size at least $n$, each
one containing at least one defective, and an additional set $A_0$ of arbitrary size
containing no defectives. Let $m \le p$. Then the number of questions needed to
find at least $m$ defectives is at least $\lceil m \log n \rceil$.

Proof. Suppose that we are given the additional information that every set $A_i$
($i \ge 1$) contains exactly one defective element. Now we use the information
theoretic lower bound: there are $\prod_{i=1}^{p} |A_i|$ possibilities for the distribution of the
defective elements at the beginning, and at most $\prod_{i=1}^{p-m} |A_{j_i}|$ at the end (suppose
we have found defective elements in every set $A_i$ except in $A_{j_1}, \ldots, A_{j_{p-m}}$), thus
if we used $l$ queries, then $2^l \ge n^m$, from which the proposition follows. □

Now we formulate the main theorem of the paper. 

Theorem 7. For any $k$, $\alpha$, $m$
$$\frac{n}{a} + m \log a - c_1(k) \le g(n,k,\alpha,m) \le \frac{n}{a} + m \log a + c_2(k),$$
where $c_1(k)$ and $c_2(k)$ depend only on $k$.

Proof. First we give an algorithm that uses at most $\frac{n}{a} + m \log a + c_2(k)$ queries,
proving the upper bound. In the first part of the procedure we ask disjoint sets
$A_1, A_2, \ldots, A_r$ of size $a$ until either there were $m$ YES answers or there are no
more elements left. In this way we ask at most $\lceil \frac{n}{a} \rceil$ questions.

Suppose we obtained YES answers for the sets $A_1, A_2, \ldots, A_{m_1}$ and NO answers
for the sets $A_{m_1+1}, \ldots, A_r$. If $m_1 \ge m$, then in the second part of the procedure
we use binary search in the sets $A_1, A_2, \ldots, A_m$ in order to find one defective
element in each of them. For this we need $m \lceil \log a \rceil$ more questions.

If $m_1 < m$, then first we use binary search in the sets $A_1, A_2, \ldots, A_{m_1}$ in
order to find defective elements $a_1 \in A_1, a_2 \in A_2, \ldots, a_{m_1} \in A_{m_1}$. Then we
iterate the whole process using $S_1 = \bigcup_{i=1}^{m_1} (A_i \setminus \{a_i\})$ as an underlying set, that is
we ask disjoint sets $B_1, B_2, \ldots, B_t$ of size $a$ until either we obtain $m - m_1$ YES
answers or there are no more elements left. Suppose we obtained YES answers
for the sets $B_1, B_2, \ldots, B_{m_2}$ and NO answers for the sets $B_{m_2+1}, \ldots, B_t$. If $m_2 \ge
m - m_1$, then in the second part of the procedure we use binary search in the sets
$B_1, B_2, \ldots, B_{m-m_1}$ in order to find one defective element in each of them, while
if $m_2 < m - m_1$, then first we use binary search in the sets $B_1, B_2, \ldots, B_{m_2}$ in
order to find defective elements $b_1 \in B_1, b_2 \in B_2, \ldots, b_{m_2} \in B_{m_2}$ and continue
the process using $S_2 = \bigcup_{i=1}^{m_2} (B_i \setminus \{b_i\})$ as an underlying set, and so on, until we
find $m = m_1 + m_2 + \cdots + m_j$ defective elements. Note that $m_i \ge 1$ for all $i \le j$,
since $k \ge m$. We have two types of queries: queries of size $a$ and queries of size
less than $a$ (used in the binary searches). The number of questions of size $a$ is
at most $\lceil \frac{n}{a} \rceil$ in the first part and at most $m_1 + m_2 + \cdots + m_{j-1} < m \le k$ in the
second part. The total number of queries of size less than $a$ is at most $m \lceil \log a \rceil$,
thus the total number of queries is at most $\lceil \frac{n}{a} \rceil + m \lceil \log a \rceil + k$, proving the upper
bound.
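
The iterated procedure of the upper bound can be sketched in Python as follows. The function name `find_m`, the way blocks are cut from a list of remaining elements and the test instance are assumptions made for illustration; the sketch assumes the input really contains at least $m$ defectives.

```python
from fractions import Fraction
from itertools import combinations
from math import floor


def find_m(n, alpha, m, defectives):
    """Find m defectives with block queries of size a plus binary searches."""
    a = floor(1 / alpha)
    queries = 0

    def ask(q):
        nonlocal queries
        queries += 1
        return len(defectives & set(q)) >= alpha * len(q)

    def binary_search(block):             # block has size <= a and a defective
        while len(block) > 1:
            half = block[: len(block) // 2]
            block = half if ask(half) else block[len(half):]
        return block[0]

    found, pool = [], list(range(n))
    while len(found) < m:
        need = m - len(found)
        yes_blocks = []
        for i in range(0, len(pool), a):  # disjoint blocks of size at most a
            block = pool[i:i + a]
            if ask(block):
                yes_blocks.append(block)
                if len(yes_blocks) == need:
                    break
        if not yes_blocks:                # only if fewer than m defectives exist
            break
        new = [binary_search(b) for b in yes_blocks]
        found.extend(new)
        # Next underlying set: the YES blocks without the elements just found.
        pool = [x for b in yes_blocks for x in b if x not in new]
    return found, queries


n, alpha, m, k = 12, Fraction(1, 3), 2, 3
a = floor(1 / alpha)
bound = -(-n // a) + m * (a - 1).bit_length() + k   # ceil(n/a) + m*ceil(log a) + k
worst = 0
for size in (k, k + 1):
    for d in combinations(range(n), size):
        found, used = find_m(n, alpha, m, set(d))
        assert len(found) == m and set(found) <= set(d)
        worst = max(worst, used)
print(worst, bound)
```

When a round yields too few YES blocks, every remaining defective lies in those blocks, so the next round on the reduced underlying set must produce at least one more YES answer.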

To prove the lower bound we need the following purely set-theoretic lemma. 

Lemma 3. Let $k, l, a$ be arbitrary positive integers and $\beta \ge 1$. Let now $\mathcal{H}$ be
a set system on an underlying set $S$ of size $c(k,l,\beta) \cdot a = k\beta(2^{kl} - 1)a$, such
that every set of $\mathcal{H}$ has size at most $\beta a$ and every element of $S$ is contained in
at most $l$ sets of $\mathcal{H}$. Then we can select $k$ disjoint subsets of $S$ (called heaps)
$K_1, K_2, \ldots, K_k$ of size $\beta a$, such that every set of $\mathcal{H}$ intersects at most one heap.

Proof. Let us partition the underlying set into $k$ heaps of size $\beta a(2^{kl} - 1)$ in an
arbitrary way. Now we execute the following procedure at most $kl - 1$ times,
eventually obtaining $k$ heaps satisfying the required conditions. In each iteration
we make sure that the members of a subfamily $\mathcal{H}'$ of $\mathcal{H}$ will intersect at most
one heap at the end.

In each iteration we do the following. We build the subfamily $\mathcal{H}' \subseteq \mathcal{H}$ by
starting from the empty subfamily and adding arbitrary sets of $\mathcal{H}$ to our
subfamily one by one until there exists a heap $K_i$ such that $|K_i \cap \bigcup_{H \in \mathcal{H}'} H| \ge |K_i|/2$, that
is, $K_i$ is at least half covered by $\mathcal{H}'$. We call $K_i$ the selected heap. If the half
of several heaps gets covered in the same step, then we select one where the
difference of the number of covered elements and the half of the size of the heap
is maximum.

Now we keep the covered part of the selected heap and keep the uncovered
part of the other heaps and throw away the other elements. We also throw away
the sets of the subfamily $\mathcal{H}'$ from our family $\mathcal{H}$, as we already made sure that
the members of $\mathcal{H}'$ will not intersect more than one heap at the end. In this way
we obtain smaller heaps but we only have to deal with the family $\mathcal{H} \setminus \mathcal{H}'$.

We prove by induction that after $s$ iterations all heaps have size at least
$\beta a(2^{kl-s} - 1)$. This trivially holds for $s = 0$. By the induction hypothesis, the
heaps had size at least $\beta a(2^{kl-s+1} - 1)$ before the $s$th iteration step. After the $s$th
step the new size of the selected heap $K$ is at least $|K|/2 \ge \beta a(2^{kl-s+1} - 1)/2 \ge
\beta a(2^{kl-s} - 1)$. Now we turn our attention to the unselected heaps. Suppose
the set we added last to $\mathcal{H}'$ is the set $I$. Clearly, $|K_j \cap \bigcup_{H \in \mathcal{H}' \setminus \{I\}} H| < |K_j|/2$
for all $j$. Let $K$ be the selected heap and $K_i$ be an arbitrary unselected heap.
Now by the choice of $K$ we have $|K_i \cap \bigcup_{H \in \mathcal{H}'} H| \le |K_i|/2 + |I|/2$, otherwise
$|K_i \cap \bigcup_{H \in \mathcal{H}'} H| + |K \cap \bigcup_{H \in \mathcal{H}'} H| > |K_i|/2 + |K|/2 + |I|$, which is impossible, since
$|K_i \cap \bigcup_{H \in \mathcal{H}'} H| + |K \cap \bigcup_{H \in \mathcal{H}'} H| = |((K_i \cup K) \cap \bigcup_{H \in \mathcal{H}' \setminus \{I\}} H) \cup ((K_i \cup K) \cap I)| <
|K_i|/2 + |K|/2 + |I|$.

Now since $|I| \le \beta a$, the new size of the unselected heap $K_i$ is $|K_i'| = |K_i \setminus
\bigcup_{H \in \mathcal{H}'} H| \ge |K_i|/2 - \beta a/2 \ge \beta a(2^{kl-s+1} - 1)/2 - \beta a/2 = \beta a(2^{kl-s} - 1)$, finishing
the proof by induction.

Now in each iteration we delete a family that covers the selected heap, thus
any heap can be selected at most $l$ times, since every element is contained in at
most $l$ sets. After $kl - 1$ iterations the size of an arbitrary heap will be still at
least $\beta a$. Furthermore, all but one heaps were selected exactly $l$ times, thus any
remaining set of $\mathcal{H}$ can only intersect the last heap. That is, heaps at this point
satisfy the required condition for all sets of $\mathcal{H}$.

If we can iterate the process at most $kl - 2$ times, then after the last possible
iteration more than half of any heap is not covered by the union of the remaining
sets. Deleting the covered elements from each heap we obtain heaps of size at
least $\beta a$ that satisfy the condition. □

Now we are in a position to prove the lower bound of Theorem 7. We use
the adversary method, i.e. we give a strategy to the adversary that forces the
questioner to ask at least $\frac{n}{a} + m \log a - c_1(k)$ questions to find $m$ defective
elements.

Recall that all questions have size at most $\lfloor k/\alpha \rfloor$ and now the adversary gives
the additional information that there are exactly $k$ defective elements.

During the procedure, the adversary maintains weights on the elements. At
the beginning all elements have weight 0. Let us denote the set of the possible
defective elements by $S'$. At the beginning $S' = S$. At each question $A$ the
strategy determines the answer and also adds appropriate weights to the elements
of $A$. If a question $A$ is of size at most $a = \lfloor 1/\alpha \rfloor$, then the answer is NO and
weight 1 is given to all elements of $A$. If $|A| > a$, the answer is still NO and weight
$a/\lfloor k/\alpha \rfloor$ is given to the elements of $A$. Thus after some $r$ questions the sum of
the weights is at most $ra$. If an element reaches weight 1, then the adversary says
that it is not defective, and the element is deleted from $S'$. The adversary does
that as long as there are still $ca$ elements in $S'$, stopping when the next step would make $S'$
smaller than this threshold (the exact value of $c$ will be determined later). Up to
this point the number of elements thrown away is at least $n - ca - \lfloor k/\alpha \rfloor$, thus
the number of queries is at least $\frac{n}{a} - c - \lfloor k/\alpha \rfloor / a \ge \frac{n}{a} - c - k$.

Let the set system $\mathcal{F}$ consist of the sets that were asked up to this point and
let $\mathcal{F}' = \{F \cap S' : F \in \mathcal{F}, |F| > a\}$.

The following observations are easy to check. 

Lemma 4.

- $|S'| \ge ca$.

- Every set $F \in \mathcal{F}'$ has size at most $\lfloor k/\alpha \rfloor \le k(a + 1) \le 2ka$.

- Every element of $S'$ is contained in at most $\lfloor k/\alpha \rfloor / a \le k(1 + 1/a) \le 2k$ sets
of $\mathcal{F}'$.

- Every $k$-set that intersects each $F \in \mathcal{F}'$ in at most one element is a possible
set of defective elements.

Now let $l := 2k$, $\beta := 2k$, and $c := c(k,l,\beta) = k\beta(2^{kl} - 1) = 2k^2(2^{2k^2} - 1)$.
By the observations above, we can apply Lemma 3 with $\mathcal{H} = \mathcal{F}'$. The lemma
guarantees the existence of heaps $K_1, K_2, \ldots, K_k$ of size $\beta a \ge a$, such that every
transversal of the $K_i$'s is a possible $k$-set of defective elements. Now by applying
Proposition 1 with $A_i = K_i$ and $A_0 = S \setminus S'$, we obtain that the questioner
needs to ask at least $\lceil m \log a \rceil$ more queries to find $m$ defective elements.

Altogether the questioner had to use at least $\frac{n}{a} - c - k + m \log a$ queries, which
proves the lower bound, since the number $c$ depends only on $k$ (the constant in
the theorem is $c_1(k) = c + k$). □

The constant in the lower bound is quite large; by a more careful analysis
one might obtain a better one. For example, we could redefine the weights, such
that we give weight $a/|A|$ to the elements of $A$, thus still distributing weight at
most $a$ per asked set.

It is also worth observing that if $1/\alpha$ is an integer, then we can use Lemma
3 with $l = \beta = k$, instead of $l = \beta = 2k$. This way one can prove stronger results
for small values of $k$ and $m$ if $1/\alpha$ is an integer. We demonstrate it for $k = 2$ in
the next section. The following claim is easy to check.

Claim. Let $\mathcal{H}$ be a set system on an underlying set $S$ of size $3a$, consisting of
disjoint sets of size at most $2a$. Then we can select 2 disjoint subsets of $S$ (called
heaps) $K_1, K_2$ of size at least $a$, such that every set of $\mathcal{H}$ intersects at most one
heap.

4 The case k = 2 

In this section we determine the exact value $g(n, 2, \alpha, 1)$. Let $\delta = \lfloor 2\{1/\alpha\} \rfloor$, where
$\{x\}$ denotes the fractional part of $x$.

Consider the following algorithm W, where n denotes the number of remain- 
ing elements: 

If $n \le 2^{\lceil \log a \rceil} + 1$, we ask a question of size $\lfloor n/2 \rfloor \le a$, then depending on
the answer we continue in the part that contains at least one defective element,
and find that with binary search.

If $2^{\lceil \log a \rceil} + 2 \le n \le 2^{\lceil \log a \rceil + 1} + 1$, then we ask a question of size $2^{\lceil \log a \rceil} + 1$
(this falls between $a$ and $2a + 1$). If the answer is YES, we put an element aside
and continue with the remaining elements of the set we asked, otherwise we
continue with the elements not in the set we asked. This way, independent of
whether we got a YES or NO answer, we have at most $2^{\lceil \log a \rceil}$ elements with at
least one defective, hence we can apply binary search.

If $2^{\lceil \log a \rceil + 1} + 2 \le n \le 3a + \delta + 2^{\lceil \log a \rceil}$, then first we ask a question of size
$2a + \delta$. If the answer is YES, we put an element aside and continue with the
remaining elements of the set we asked, otherwise we continue with the elements
not in the set we asked. This way, independent of whether we got a YES or NO
answer, we have at most $2^{\lceil \log a \rceil} + a$ elements with at least one defective. We
continue with a set of size $a$, and after that we can finish with binary search.

If $n \ge 3a + \delta + 2^{\lceil \log a \rceil} + 1$, then we ask a question of size $a$. If the answer is
NO, we proceed as above. If the answer is YES, we can find a defective element
with at most $\lceil \log a \rceil$ further questions.

Counting the number of questions used in each case, we can conclude. 

Claim. If $n \le 3a + \delta + 2^{\lceil \log a \rceil}$, then algorithm W takes only $\lceil \log(n - 1) \rceil$ questions,
thus according to Theorem 3 it is optimal.

In fact a stronger statement is true. Note that the following theorem does
not contradict Conjecture 1, as the algorithm mentioned there uses the same
number of steps as algorithm W in case $k = 2$, $1/\alpha$ is an integer and $\lceil n/a \rceil \ge 4$.

Theorem 8. Algorithm W is optimal for any n. 

Proof. We prove a slightly stronger statement, that algorithm W is optimal even 
among those algorithms that have access to an unlimited number of extra non- 
defective elements. This is crucial as we use induction on the number of elements, 
n. 

It is easy to check that the answer for a set that is greater than $2a + \delta$ is
always NO, while if both defective elements are in a set of size $2a + \delta$, then the
answer is YES. We say that a question is small if its size is at most $a$, and big if
its size is between $a + 1$ and $2a + \delta$. Note that small questions test if there is at
least one defective element in the set, while big questions test if both defective
elements are in the set. Suppose by contradiction that there exists an algorithm
Z that is better than W, i.e. there is a set of elements for which Z is faster than
W. Denote by $n$ the size of the smallest such set and by $z(n)$ the number of steps
in algorithm Z. We will establish through a series of claims that such an $n$ cannot
exist. It already follows from the previous Claim that $n$ has to be at least $3a + \delta + 2^{\lceil \log a \rceil}$.

Note that for $n = 3a + \delta + 2^{\lceil \log a \rceil}$ algorithm W uses $\lceil \log(n - 1) \rceil = \lceil \log(2a + \delta - 1) \rceil + 1$
questions. An important tool is the following lemma.

Lemma 5. If $n \ge 3a + \delta + 2^{\lceil \log a \rceil} + 1$, then algorithm Z has to start with a big
question. Moreover, it can ask a small question among the first $z(n) - \lceil \log(2a + \delta - 1) \rceil$
questions only if one of the previous answers was YES.

Proof. First we prove that algorithm Z has to start with a big question. Suppose 
it starts with a small question. We show that in case the answer is NO, it cannot 
be faster than algorithm W. In this case after the first answer there are at least 
n — a (and at most n — 1) elements which can be defective, and an unlimited 
number of non-defective elements, including those which are elements of the first 
question. By induction algorithm W is optimal in this case, and one can easily 
see that it cannot be faster if there are more elements, hence algorithm Z cannot 
be faster than algorithm W on n — a elements plus one more question. On the 
other hand algorithm W clearly uses this many questions (as it starts with a 
question of size a), hence it cannot be slower than algorithm Z. 



Similarly, to prove the moreover part, suppose that the first z{n) — [log(2a + 
6 — 1)] answers are NO and one of these questions, A is small. Let us delete every 
element of A. By induction algorithm W is optimal on the remaining at least n—a 
elements, hence similarly to the previous case, algorithm Z uses more questions 
than algorithm W on ?i — a elements, hence cannot be faster than algorithm 
W. More precisely, we can define algorithm W, which starts with asking A, and 
after that proceeds as algorithm W. One can easily see that algorithm W cannot 
be slower than algorithm Z or faster than algorithm W. □ 

Note that a YES answer would mean that $\lceil \log(2a + \delta - 1) \rceil$ further 
questions would be enough to find a defective with binary search, hence in the 
worst case (when the most steps are needed) no such answer occurs among the first 
$z(n) - \lceil \log(2a + \delta - 1) \rceil$ questions anyway. Now we can finish the 
proof of the theorem with the following claim. 

Claim. If $n > 3a + \delta + 2^{\lceil \log a \rceil}$, then algorithm W is optimal. 

Proof. If not, then the smallest $n$ for which W is not optimal must be of the form 
$2a + \delta + 2^{\lceil \log a \rceil} + za + 1$, where $z \ge 1$ is an integer. (This 
follows from the fact that the number of required questions is monotone in $n$ if 
we allow the algorithm to have access to an unlimited number of extra 
non-defective elements.) By contradiction, suppose that algorithm Z uses only 
$\lceil \log(2a + \delta - 1) \rceil + z$ questions. Suppose the answers to the first 
$z$ questions are NO. Then by Lemma 5 these questions are big. Suppose that the 
$z + 1$st answer is also NO. We distinguish two cases depending on the size of the 
$z + 1$st question $A$. In both cases we will use reasoning similar to the one in 
Theorem [5]. 

Case 1. The $z + 1$st question is small. After the answer there are 
$\lceil \log(2a + \delta - 1) \rceil - 1$ questions left, so depending on the answers 
given to them, any deterministic algorithm can choose at most 
$2^{\lceil \log(2a + \delta - 1) \rceil - 1}$ elements. Hence algorithm Z gives us 
after the $z + 1$st answer a set $B$ of at most 
$2^{\lceil \log(2a + \delta - 1) \rceil - 1}$ elements, which contains a defective. 

Before starting the algorithm, all the $\binom{n}{2}$ pairs are possible candidates 
to be the set of defective elements. However, after the $z + 1$st question (knowing 
the algorithm) the only candidates are those which intersect $B$. The $z + 1$st 
question shows at most $a$ non-defective elements, but all the pairs which intersect 
neither $A$ nor $B$ have to be excluded by the first $z$ questions. Thus 

$$\binom{n - |A| - |B|}{2} \ge \binom{2a + \delta + 2^{\lceil \log a \rceil} + za + 1 - a - 2^{\lceil \log(2a + \delta - 1) \rceil - 1}}{2} \ge \binom{(z+1)a + \delta + 1}{2}$$

pairs should be excluded, but $z$ questions can exclude at most 
$z \binom{2a + \delta}{2}$ pairs, which is less if $z \ge 1$. 

Case 2. The $z + 1$st question is big. After it we have 
$\lceil \log(2a + \delta - 1) \rceil - 1$ questions left, so depending on the answers 
given to them, any deterministic algorithm can choose at most 
$2^{\lceil \log(2a + \delta - 1) \rceil - 1}$ elements. This means that we have to 
exclude with the first $z + 1$ questions at least 

$$\binom{2a + \delta + 2^{\lceil \log a \rceil} + za + 1 - 2^{\lceil \log(2a + \delta - 1) \rceil - 1}}{2} \ge \binom{(z+2)a + \delta + 1}{2}$$

pairs, but they can exclude at most $(z + 1) \binom{2a + \delta}{2}$ pairs, which is 
less if $z \ge 1$. □ 
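The two counting inequalities that close Case 1 and Case 2 can be sanity-checked numerically. The following is only an illustrative sketch, not part of the proof: the function names are hypothetical, and it assumes that $\delta \in \{0, 1\}$ and that a NO answer to a big question excludes at most $\binom{2a + \delta}{2}$ pairs, which is how the bounds above are read here.

```python
from math import comb

def case1_holds(a, delta, z):
    """Case 1: pairs to exclude exceed what z big questions can exclude."""
    return comb((z + 1) * a + delta + 1, 2) > z * comb(2 * a + delta, 2)

def case2_holds(a, delta, z):
    """Case 2: pairs to exclude exceed what z + 1 big questions can exclude."""
    return comb((z + 2) * a + delta + 1, 2) > (z + 1) * comb(2 * a + delta, 2)

def check_range(max_a=60, max_z=60):
    """Check both inequalities for 1 <= a <= max_a, 1 <= z <= max_z.

    Assumption (not taken from the text): delta is restricted to {0, 1}.
    """
    return all(
        case1_holds(a, d, z) and case2_holds(a, d, z)
        for a in range(1, max_a + 1)
        for d in (0, 1)
        for z in range(1, max_z + 1)
    )
```

Calling `check_range()` confirms the inequalities on a finite grid only. The restriction on $\delta$ matters: for instance, `case1_holds(1, 100, 3)` is false, so the argument does rely on $\delta$ being small compared to $a$.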



This finishes the proof of the theorem. 



□ 



5 Open problems 



It is quite natural to think that $g(n, k, \alpha, m)$ is increasing in $n$, but we 
did not manage to prove that. The monotonicity in $k$ and $m$ is obvious from the 
definition. On the other hand, we could have defined $g(n, k, \alpha, m)$ as the 
smallest number of questions needed to find $m$ defectives assuming there are 
exactly $k$ defectives (instead of at least $k$ defectives) among the $n$ elements, 
in which case the monotonicity in $k$ is far from trivial. We conjecture that this 
definition gives the same function as the original one. 

It might seem strange to look for monotonicity in $\alpha$, but we have seen that 
for $m = 1$ we can reach the information theoretic lower bound (which is 
$\lceil \log(n - k + 1) \rceil$ in this setting) for $\alpha < 2/(n - k + 1)$. All the 
theorems from Section [5] also suggest that the smaller $\alpha$ is, the faster the 
best algorithm is, even for general $m$. Basically, in case of a NO answer it is 
better if $\alpha$ is small, and in case of a YES answer the size of $\alpha$ does not 
matter very much, since the process can be finished fast. However, we could only 
prove Theorem [7] concerning this matter. 

Another interesting question arises if we are allowed to choose $\alpha$. If $m = 1$ 
then we should choose $\alpha < 1/(n - k + 1)$, and as we have mentioned in the 
previous paragraph, we believe that a small enough $\alpha$ is the best choice. 

Another possibility would be if we were allowed to choose a new $\alpha$ for every 
question. Again, we believe that the best solution is to choose the same, small 
enough $\alpha$ every time. This would obviously imply the previous conjecture. 

Finally, a more general model to study is the following. We are given two 
parameters, $\alpha \ge \beta$. If at least an $\alpha$ fraction of the set is 
defective, then the answer is YES; if at most a $\beta$ fraction, then it is NO; 
in between the answer is arbitrary. With these parameters, this paper studied 
the case $\alpha = \beta$. This model is somewhat similar to the threshold testing 
model, where instead of ratios $\alpha$ and $\beta$ there are fixed values $a$ and 
$b$ as thresholds. 
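To make the two-parameter model concrete, here is a minimal sketch of a test oracle for it. The function name `threshold_answer` and the `in_between` parameter are hypothetical, introduced only for illustration. The sketch also assumes that the YES rule is checked first, so that for $\alpha = \beta$ it reduces to the model of this paper (YES if and only if at least an $\alpha$ fraction of the query is defective).

```python
from fractions import Fraction

def threshold_answer(query, defectives, alpha, beta, in_between=False):
    """Answer a query set in the (alpha, beta) model, with alpha >= beta.

    query, defectives: sets of elements; alpha, beta: Fractions in [0, 1].
    in_between: the answer returned in the gap, where the model allows
    an arbitrary answer (a hypothetical knob, not part of the model).
    """
    if not query:
        raise ValueError("query set must be non-empty")
    if alpha < beta:
        raise ValueError("the model requires alpha >= beta")
    # Exact fraction of defective elements in the query.
    frac = Fraction(len(query & defectives), len(query))
    if frac >= alpha:   # at least an alpha fraction is defective: YES
        return True
    if frac <= beta:    # at most a beta fraction is defective: NO
        return False
    return in_between   # strictly between beta and alpha: arbitrary
```

For example, with $\alpha = 1/2$ and $\beta = 1/4$, the query `{1, 2, 3}` with defective set `{1}` has defective fraction $1/3$, which lies in the gap, so either answer is consistent with the model.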